To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning. However, such a paradigm is computationally expensive. Humans have the amazing ability to learn new visual concepts from just a single exemplar. We hereby study a new T2V generation problem$\unicode{x2014}$One-Shot Video Generation, where only a single text-video pair is presented for training an open-domain T2V generator. Intuitively, we propose to adapt the T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with the verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via an efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally-coherent videos over various applications such as change of subject or background, attribute editing, and style transfer, demonstrating the versatility and effectiveness of our method.
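As a rough illustration (not the authors' implementation), sparse-causal attention restricts each frame's keys and values to the first frame and the immediately preceding frame, which keeps content consistent while allowing motion to propagate. A minimal NumPy sketch, assuming per-frame token features of shape (frames, tokens, dim):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_causal_attention(q, k, v):
    """q, k, v: (frames, tokens, dim). For frame i, keys/values are
    gathered only from frame 0 and frame i-1 (frame 0 uses itself),
    instead of full spatio-temporal attention over all frames."""
    f, t, d = q.shape
    out = np.empty_like(q)
    for i in range(f):
        prev = max(i - 1, 0)
        ks = np.concatenate([k[0], k[prev]], axis=0)  # (2t, d)
        vs = np.concatenate([v[0], v[prev]], axis=0)
        attn = softmax(q[i] @ ks.T / np.sqrt(d))      # (t, 2t)
        out[i] = attn @ vs
    return out
```

Because frame i never sees frames beyond i-1 (other than frame 0), the attention cost stays constant in the number of frames rather than growing quadratically.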
Graph neural networks (GNNs) are susceptible to privacy inference attacks (PIAs), given their ability to learn joint representations from features and edges among nodes in graph data. To prevent privacy leakages in GNNs, we propose a novel heterogeneous randomized response (HeteroRR) mechanism to protect nodes' features and edges against PIAs under differential privacy (DP) guarantees without an undue cost of data and model utility in training GNNs. Our idea is to balance the importance and sensitivity of nodes' features and edges in redistributing the privacy budgets, since some features and edges are more sensitive or important to the model utility than others. As a result, we derive significantly better randomization probabilities and tighter error bounds at the levels of both nodes' features and edges than existing approaches, thus enabling us to maintain high data utility for training GNNs. An extensive theoretical and empirical analysis using benchmark datasets shows that HeteroRR significantly outperforms various baselines in terms of model utility under rigorous privacy protection for both nodes' features and edges. That enables us to defend against PIAs in DP-preserving GNNs effectively.
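A toy sketch of the budget-redistribution idea (our simplification, not the paper's exact mechanism): give each binary feature a share of the total budget proportional to an importance score, then apply standard binary randomized response per feature, so that important features are perturbed less:

```python
import numpy as np

def hetero_rr(x, importance, total_eps, rng):
    """Heterogeneous randomized response over binary features (sketch).

    Each feature i gets budget eps_i proportional to its importance, so
    important features are kept with higher probability:
        p_keep_i = exp(eps_i) / (1 + exp(eps_i))
    By sequential composition, the whole vector satisfies total_eps-DP.
    """
    importance = np.asarray(importance, dtype=float)
    eps = total_eps * importance / importance.sum()
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    # keep each bit w.p. p_keep, flip it otherwise
    noisy = np.where(rng.random(x.shape) < p_keep, x, 1 - x)
    return noisy, p_keep
```

HeteroRR's actual mechanism additionally accounts for feature sensitivity and handles edges; the sketch only shows why unequal budget splitting preserves important features better than a uniform split.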
One-shot voice conversion (VC), where only a single utterance of the target speaker is available for reference, has become a hot research topic. Existing works typically disentangle timbre, while information about pitch, rhythm, and content remains entangled. To further disentangle these speech components and perform one-shot VC effectively, we employ random resampling for the pitch and content encoders, and use variational contrastive log-ratio upper bounds on mutual information together with gradient-reversal-layer-based adversarial mutual information learning to ensure that the latent spaces of the different parts contain only the desired disentangled representations during training. Experiments on the VCTK dataset show that the model achieves state-of-the-art performance for one-shot VC in terms of naturalness and intelligibility. In addition, through speech representation disentanglement, we can separately transfer the timbre, pitch, and rhythm characteristics in one-shot VC. Our code, pre-trained models, and demo are available at https://im1eon.github.io/is2022-Srdvc/.
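The random-resampling operation can be sketched as randomly time-stretching short segments of the input feature sequence, which perturbs rhythm information while leaving the content recoverable (a simplified illustration; the segment length and stretch range here are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def random_resample(feats, seg_len, rng, lo=0.5, hi=1.5):
    """Randomly time-stretch each segment of a (T, D) feature sequence
    by a factor in [lo, hi] via linear interpolation, destroying rhythm
    cues so the encoder cannot rely on them."""
    out = []
    for start in range(0, len(feats), seg_len):
        seg = feats[start:start + seg_len]
        ratio = rng.uniform(lo, hi)
        new_len = max(1, int(round(len(seg) * ratio)))
        idx = np.linspace(0.0, len(seg) - 1, new_len)
        lo_i = np.floor(idx).astype(int)
        hi_i = np.ceil(idx).astype(int)
        frac = (idx - lo_i)[:, None]
        out.append(seg[lo_i] * (1.0 - frac) + seg[hi_i] * frac)
    return np.concatenate(out, axis=0)
```

Feeding such resampled features to the pitch and content encoders means any rhythm information they output cannot be trusted by the decoder, which must instead obtain rhythm from the dedicated rhythm branch.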
Artificial neural networks (ANNs), originally inspired by biological neural networks (BNNs), have achieved remarkable success in many tasks such as visual representation learning. However, whether there exist semantic correlations/connections between the visual representations in ANNs and those in BNNs remains largely unexplored, due to the absence of effective tools to link and couple the two distinct domains, and the lack of a general and effective framework for representing visual semantics in BNNs, such as human functional brain networks (FBNs). To answer this question, we propose a novel computational framework, Synchronized Activations (Sync-ACT), to couple the visual representation spaces and semantics between ANNs and the human brain based on naturalistic functional magnetic resonance imaging (nfMRI) data. With this approach, we are able, for the first time, to annotate the neurons in ANNs with biologically meaningful descriptions derived from human brain imaging. We evaluated the Sync-ACT framework on two publicly available movie-watching nfMRI datasets. The experiments demonstrate a) significant correlation and similarity between the visual representations in FBNs and those in a variety of convolutional neural network (CNN) models; b) a close relationship between the similarity of a CNN's visual representations to BNNs and its performance on image classification tasks. Overall, our study introduces a general and effective paradigm for coupling ANNs and BNNs, and provides novel insights for future studies such as brain-inspired artificial intelligence.
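The core measurement, synchronizing activations along a shared stimulus timeline and correlating them, can be sketched as a unit-by-network Pearson correlation (a simplification; the actual framework operates on nfMRI-derived FBN time courses and CNN activations aligned to the same movie stimulus):

```python
import numpy as np

def sync_activation_similarity(ann_acts, bnn_acts):
    """Correlate ANN unit time series (T, U) with brain-network time
    series (T, B) recorded over the same stimulus timeline; returns a
    (U, B) Pearson correlation matrix that can be used to annotate
    each ANN unit with its most similar brain network."""
    a = (ann_acts - ann_acts.mean(0)) / ann_acts.std(0)
    b = (bnn_acts - bnn_acts.mean(0)) / bnn_acts.std(0)
    return a.T @ b / ann_acts.shape[0]
```

An ANN unit can then be annotated with the description of whichever FBN maximizes its row of the correlation matrix.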
This paper investigates the problem of real-world video super-resolution (VSR) for animation videos, and reveals three key improvements for practical animation VSR. First, recent real-world super-resolution methods typically rely on degradation simulation with basic operators that have no learning capability, such as blur, noise, and compression. In this work, we propose to learn such basic operators from real low-quality animation videos, and incorporate the learned operators into the degradation generation pipeline. Such neural-network-based basic operators help to better capture the distribution of real degradations. Second, a large-scale high-quality animation video dataset, AVC, is built to facilitate comprehensive training and evaluation of animation VSR. Third, we further investigate an efficient multi-scale network structure. It takes advantage of the efficiency of unidirectional recurrent networks and the effectiveness of sliding-window-based methods. Thanks to the above delicate designs, our method, AnimeSR, is capable of restoring real-world low-quality animation videos effectively and efficiently, achieving superior performance over previous state-of-the-art methods.
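A classical hand-crafted degradation pipeline of the kind the paper improves on, Gaussian blur followed by downsampling and noise, can be sketched as below; the paper's first contribution is to replace such fixed operators with small networks learned from real low-quality animation clips (the kernel size, sigma, and noise level here are illustrative):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(img, scale=2, blur_sigma=1.0, noise_std=0.02, rng=None):
    """Hand-crafted degradation: Gaussian blur -> downsample -> noise.
    img: (H, W) array in [0, 1]. AnimeSR would swap the fixed blur
    operator for a learned neural operator."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = gaussian_kernel(5, blur_sigma)
    pad = np.pad(img, 2, mode='edge')
    blurred = np.zeros_like(img)
    for i in range(5):                      # direct 5x5 convolution
        for j in range(5):
            blurred += k[i, j] * pad[i:i + img.shape[0],
                                     j:j + img.shape[1]]
    low = blurred[::scale, ::scale]         # nearest-style downsampling
    noisy = low + rng.normal(0.0, noise_std, low.shape)
    return np.clip(noisy, 0.0, 1.0)
```

Because each stage here is fixed and parameter-free, the simulated degradations can only match real animation artifacts as well as the hand-picked operators allow, which motivates learning the operators instead.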
Interactive image restoration aims to restore images by adjusting several controlling coefficients that determine the restoration strength. Existing methods are restricted to learning controllable functions under the supervision of known degradation types and levels. They usually suffer severe performance drops when the real degradation deviates from these assumptions. Such a limitation stems from the complexity of real-world degradations, which cannot provide explicit supervision for interactive modulation during training. However, how to realize interactive modulation in real-world super-resolution has not yet been studied. In this work, we present Metric Learning based Interactive Modulation for Real-World Super-Resolution (MM-RealSR). Specifically, we propose an unsupervised degradation estimation strategy to estimate degradation levels in real-world scenarios. Instead of using known degradation levels as explicit supervision for the interactive mechanism, we propose a metric learning strategy to map the unquantifiable degradation levels in real-world scenarios to a metric space, which is trained in an unsupervised manner. Moreover, we introduce an anchor-point strategy in the metric learning process to normalize the distribution of the metric space. Extensive experiments demonstrate that the proposed MM-RealSR achieves excellent modulation and restoration performance in real-world super-resolution. Codes are available at https://github.com/tencentarc/mm-realsr.
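The unsupervised metric-learning objective can be sketched with a margin ranking loss: given two synthetic degradations of the same image, the stronger one must map to a larger score, while anchor terms pin near-clean and maximally degraded samples to the ends of the metric space (our simplified rendering of the idea; the margin and anchor values are illustrative):

```python
import numpy as np

def degradation_metric_loss(score_weak, score_strong,
                            score_min, score_max, margin=0.2):
    """Ranking term: the more strongly degraded sample must score
    higher than the weakly degraded one by at least `margin`.
    Anchor term: near-clean and maximally degraded samples are pulled
    toward fixed anchors 0 and 1 to normalize the metric space."""
    rank = np.maximum(0.0, score_weak - score_strong + margin).mean()
    anchor = ((score_min - 0.0) ** 2).mean() + ((score_max - 1.0) ** 2).mean()
    return rank + anchor
```

Note that no absolute degradation level ever enters the loss, only the relative ordering of the two synthetic degradations, which is known by construction; this is what makes the training unsupervised with respect to real degradation levels.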
Colorization has attracted increasing interest in recent years. Classic reference-based methods usually rely on external color images for plausible results, and retrieving such exemplars inevitably requires a large image database or online search engines. Recent deep-learning-based methods can colorize images automatically at a low cost, but they are always accompanied by unsatisfactory artifacts and incoherent colors. In this work, we propose GCP-Colorization, which leverages the rich and diverse color priors encapsulated in a pretrained generative adversarial network (GAN) for automatic colorization. Specifically, we first "retrieve" matched features (similar to exemplars) via a GAN encoder, and then incorporate these features into the colorization process with feature modulation. Thanks to the powerful generative color prior (GCP) and delicate designs, our GCP-Colorization can produce vivid colors with a single forward pass. Moreover, it is highly convenient to obtain diverse results by modifying the GAN latent codes. GCP-Colorization also inherits the merit of interpretable GANs and can achieve controllable and smooth transitions by walking through the GAN latent space. Extensive experiments and user studies demonstrate that GCP-Colorization achieves superior performance over previous works. Codes are available at https://github.com/tothebeginning/gcp-colorization.
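The feature-modulation step, fusing the retrieved GAN features into the colorization branch, can be sketched as a spatially adaptive scale-and-shift (our simplified version; in practice the scale and shift maps would be predicted by small convolutional heads on the GAN features, here approximated by 1x1-conv-like per-pixel linear maps):

```python
import numpy as np

def modulation_heads(gan_feat, w_scale, w_shift):
    """1x1-conv-like heads: project GAN prior features (Cg, H, W)
    to scale and shift maps (C, H, W) via per-pixel linear maps."""
    scale = np.einsum('oc,chw->ohw', w_scale, gan_feat)
    shift = np.einsum('oc,chw->ohw', w_shift, gan_feat)
    return scale, shift

def modulate(feat, scale, shift):
    """Apply spatially adaptive modulation to colorization
    features (C, H, W): identity when scale and shift are zero."""
    return feat * (1.0 + scale) + shift
```

Because the modulation is per-position, the color prior can influence different image regions differently, which is what lets exemplar-like GAN features guide local colors.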
Since the introduction of the original BERT (i.e., BASE BERT), researchers have developed various customized BERT models with better performance on specific domains and tasks by exploiting the benefits of transfer learning. Due to the nature of mathematical texts, which often use domain-specific vocabulary along with equations and math symbols, the development of a new BERT model for mathematics would be useful for many mathematical downstream tasks. In this resource paper, we introduce our multi-institutional effort (i.e., two learning platforms and three academic institutions in the US) toward this need: MathBERT, a model created by pre-training the BASE BERT model on a large mathematical corpus ranging from pre-kindergarten (pre-k) to high-school and college-graduate-level mathematical content. In addition, we select three general NLP tasks that are commonly used in mathematics education, knowledge component prediction, auto-grading of open-ended Q&A, and knowledge tracing, to demonstrate the superiority of MathBERT over BASE BERT. Our experiments show that MathBERT outperforms the prior best methods by 1.2-22% and BASE BERT by 2-8% on these tasks. In addition, we build a mathematics-specific vocabulary, "mathVocab", to train MathBERT. We discover that MathBERT pre-trained with "mathVocab" outperforms MathBERT trained with the BASE BERT vocabulary (i.e., "origVocab"). MathBERT is currently being adopted by the participating learning platforms: Stride, Inc., a commercial educational resource provider, and ASSISTments.org, a free online educational platform. We release MathBERT for public usage at https://github.com/tbs17/mathbert.
Algorithmic fairness is becoming increasingly important in data mining and machine learning. Among others, a foundational notion is group fairness. The vast majority of the existing works on group fairness, with a few exceptions, primarily focus on debiasing with respect to a single sensitive attribute, despite the fact that the co-existence of multiple sensitive attributes (e.g., gender, race, marital status, etc.) in the real world is commonplace. As such, methods that can ensure a fair learning outcome with respect to all sensitive attributes of concern simultaneously need to be developed. In this paper, we study the problem of information-theoretic intersectional fairness (InfoFair), where statistical parity, a representative group fairness measure, is guaranteed among demographic groups formed by multiple sensitive attributes of interest. We formulate it as a mutual information minimization problem and propose a generic end-to-end algorithmic framework to solve it. The key idea is to leverage a variational representation of mutual information, which considers the variational distribution between learning outcomes and sensitive attributes, as well as the density ratio between the variational and the original distributions. Our proposed framework is generalizable to many different settings, including other statistical notions of fairness, and can handle any type of learning task equipped with a gradient-based optimizer. Empirical evaluations in the fair classification task on three real-world datasets demonstrate that our proposed framework can effectively debias the classification results with minimal impact on classification accuracy.
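The quantity being minimized, the mutual information between learning outcomes and sensitive attributes, can be illustrated with a plug-in estimate on discrete samples (InfoFair itself minimizes a variational bound so the objective remains differentiable end-to-end, but the target quantity is the same):

```python
import numpy as np

def empirical_mi(y, s):
    """Plug-in estimate of I(Y; S) in nats from paired discrete samples.
    I(Y; S) = 0 iff predictions Y are statistically independent of the
    sensitive attribute S, i.e., statistical parity holds."""
    y_vals, y_idx = np.unique(y, return_inverse=True)
    s_vals, s_idx = np.unique(s, return_inverse=True)
    joint = np.zeros((y_vals.size, s_vals.size))
    np.add.at(joint, (y_idx, s_idx), 1.0)       # joint counts
    p = joint / joint.sum()                     # joint distribution
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / outer[mask])).sum())
```

For intersectional fairness, `s` would encode the cross-product of all sensitive attributes (e.g., gender x race), so driving this quantity to zero enforces parity across every intersectional demographic group at once.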
The Super-Resolution Generative Adversarial Network (SRGAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN: network architecture, adversarial loss, and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN, and won the first place in the PIRM2018-SR Challenge [3]. The code is available at https://github.com/xinntao/ESRGAN.
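The relativistic average discriminator can be sketched as follows (a minimal NumPy rendering of the loss from [2], operating directly on discriminator logits):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ra_d_loss(c_real, c_fake, eps=1e-12):
    """Relativistic average discriminator loss. Instead of asking
    'is this image real?', the discriminator estimates 'is this real
    image more realistic than the average fake one?' (and vice versa
    for fakes), which gives the generator gradients from both sides."""
    d_real = sigmoid(c_real - c_fake.mean())   # real vs. average fake
    d_fake = sigmoid(c_fake - c_real.mean())   # fake vs. average real
    return float(-(np.log(d_real + eps).mean()
                   + np.log(1.0 - d_fake + eps).mean()))
```

The generator loss mirrors this with the roles of real and fake swapped, so unlike a standard GAN the generator also receives gradients from real images.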